Durational modelling for improved connected digit recognition

Author

  • Kevin Power
Abstract

A durational modelling technique is proposed for connected digit recognition based on continuous-density hidden Markov models (CDHMMs). The technique reduces the insertion error rate; insertions are typically the most frequent recognition error when no grammar constraint is applied. Insertion errors can be attributed in part to the acknowledged weakness of the acoustic models in capturing the temporal structure of speech signals. Two forms of durational model are investigated: an expanded-state model and an explicit model. Both forms significantly reduce the number of insertion errors and hence the digit string error rate. A modification to the explicit model that also accounts for speaking rate is described.
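
To make the idea of an explicit durational model concrete, the following is a minimal sketch and not the paper's formulation, which the abstract does not give. It assumes a Gaussian duration density per digit, a hypothetical table `DURATION_STATS` with made-up values, and a simple scalar `rate` factor standing in for speaking-rate compensation. It shows how a duration score added to the acoustic score can penalise a hypothesis that contains an implausibly short inserted digit.

```python
import math

# Hypothetical per-digit duration statistics in frames (mean, standard deviation).
# Illustrative values only, not taken from the paper.
DURATION_STATS = {
    "one": (30.0, 8.0),
    "two": (28.0, 7.0),
    "oh": (22.0, 6.0),
}


def duration_log_prob(word, frames, rate=1.0):
    """Gaussian log-density of a word duration.

    The duration is divided by `rate` first; rate > 1 is assumed to mean
    slower-than-average speech (an assumption of this sketch).
    """
    mean, std = DURATION_STATS[word]
    normalised = frames / rate
    z = (normalised - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def rescore(hypothesis, acoustic_log_lik, weight=1.0, rate=1.0):
    """Add the weighted sum of duration log-probabilities to an acoustic score."""
    duration_score = sum(duration_log_prob(w, n, rate) for w, n in hypothesis)
    return acoustic_log_lik + weight * duration_score


# Two competing segmentations of the same 58 frames, as (word, frames) pairs.
correct = [("one", 30), ("two", 28)]
with_insertion = [("one", 30), ("oh", 4), ("two", 24)]

# Assume the acoustic model alone slightly prefers the hypothesis with the insertion.
score_correct = rescore(correct, acoustic_log_lik=-400.0)
score_insertion = rescore(with_insertion, acoustic_log_lik=-398.0)

print(score_correct > score_insertion)
```

With these assumed numbers the 4-frame "oh" lies three standard deviations below its mean duration, so the duration term outweighs the small acoustic advantage of the insertion hypothesis. The `weight` parameter is a free tuning constant in this sketch.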

Similar resources

Modelling phonetic context using head-body-tail models for connected digit recognition

Both whole word modelling and context modelling have proven to improve recognition performance for connected digit strings. In this paper we will show that word boundary variation can be effectively modelled by applying the Head-Body-Tail (HBT) method as proposed by Chou et al in [1] and also applied by Gandhi in [2]. Each digit is split into three parts, representing the beginning, middle and ...

Connected Digit Recognition with Class Specific Word Models

This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utteran...

Context-dependent word duration modelling for robust speech recognition

Conventional hidden Markov models (HMMs) have weak duration constraints. This may cause the decoder to produce word matches with unrealistic durations in noisy situations. This paper describes techniques for modelling context-dependent word duration cues and incorporating them directly in a multi-stack decoding algorithm. The proposed model is capable of penalising duration constraints of a wor...

An embedded word training procedure for connected digit recognition

The "conventional" way of obtaining word reference patterns for connected word recognition systems is to use isolated word patterns, and to rely on the dynamics of the matching algorithm to account for the differences in connected speech. Connected word recognition, based on such an approach, tends to become unreliable (high error rates) when the talking rate becomes grossly incommensurate with...

Suprasegmental duration modelling with elastic constraints in automatic speech recognition

In this paper a method of integrating a model of suprasegmental duration with a HMM-based recogniser at the post-processing level is presented. The N-Best utterance output is rescored using a suitable linear combination of acoustic log-likelihood (provided by a set of tied-state triphone HMMs) and duration log-likelihood (provided by a set of durational models). The durational model used in the...

Publication date: 1996